Palantir Technologies

VAST 2009 Challenge
Challenge 1: Badge and Network Traffic

Authors and Affiliations

Palantir Technologies – VAST09 Team
Brandon Wright, Palantir Technologies, bwright@palantirtech.com
Jason Payne, Palantir Technologies
Matt Steckman, Palantir Technologies

Tools

Overview: Palantir is a platform for collaborative, all-source analysis and operations, enabling geospatial, social-network, temporal, statistical, and structured and unstructured analysis. Palantir provides flexible tools to import and model data, intuitive constructs to search against this data, and powerful techniques to iteratively define and test hypotheses. Our platform is most highly valued for:

Quick integration of Enterprise data sources. Why should analysts have to query each data source individually?
Simple, intuitive search and discovery. Why should analysts have to understand schemas and query languages?
Extensibility. Palantir is highly adaptable through extensibility, enabling new integrations and visualizations to be developed in a matter of hours, not days.
Open and interoperable with other toolkits. Many analysts have an established set of tools they rely on, which is why Palantir was built to interoperate.
Collaboration. Palantir facilitates collaboration across users, groups, and agencies.

Background: Palantir is operational today at many of the most prestigious intelligence, defense, law enforcement, and regulation/oversight organizations in the world. Palantir was put together by the founders of PayPal, capitalizing on the lessons learned by their anti-fraud department. Facing highly coordinated cyber attacks in order to commit payment fraud and exploit sensitive consumer information, an entirely new approach was required. Existing technology was poorly suited to dealing with sparse, cyber-specific data. To defeat the international fraud rings, high level conceptual access to the data was required. The analyst-driven intelligence analysis tools that eventually became the Palantir platform were a direct outgrowth of this effort.

Company Web site:
http://www.palantirtech.com

Check out our Analysis Blog to see more analysis using Palantir: http://www.palantirtech.com/government/analysis-blog

Video

Palantir_MC1_Video.wmv

Answers

MC1.1: Identify which computer(s) the employee most likely used to send information to his contact in a tab-delimited table which contains for each computer identified: when the information was sent, how much information was sent and where that information was sent.

Traffic.txt

MC1.2: Characterize the patterns of behavior of suspicious computer use.

Summary

Based on analysis of the MC1 data in the Palantir platform, we believe embassy employee 30 most likely is the malicious insider who transmitted embassy data to the outside criminal organization. We identified 18 probable instances of 30 using 12 different embassy computers to make unauthorized Data Transmissions to IP address 100.59.151.133. These data transmissions, all over port 8080, involved very large payload requests and occurred one to three times per day every Tuesday and Thursday over a four week period on computers in vacant offices.

We spent about three hours preparing and running the automated data import into Palantir, and the analysis and completion of the following workflows took about one and a half hours.

Data Preparation and Import into Palantir

Prior to analyzing the three MC1 datasets in Palantir, we first prepared the platform and data for import. To begin, we used Palantir’s Dynamic Ontology Manager to build an ontology to accurately model the MC1 data. For instance, we created proximity badge events and added a number of properties to Data Transmission event objects, such as payload request and response size. Next we imported the classified space prox-in/prox-out timestamps paired together so the “Prox-event Classified” events would have a duration of the entire time the associated employee was in the classified space.

In Palantir, the data import process is quite simple. The user adds a file or database to the import wizard, which allows the user to map columns in the data file to properties based on the chosen object type. The import wizard then automatically imports the data, creating objects with their respective properties, linking objects as specified, and resolving duplicate objects based on customizable resolution rules.

Investigational Hypotheses

One of Palantir’s strengths as an analytical platform is the ability to integrate different types of data into a single investigational environment. Simply searching for suspicious network traffic can be an impossible task, but by correlating proximity badge events with data transmissions we could easily identify unauthorized data transmissions and continue investigating from there. We developed a set of logical questions to ask about the data and designed workflows to answer those questions. We first focused our investigation by looking for “piggybacking” events involving the classified space. With all 4080 “Prox-event Classified” events on the graph, we identified five open-ended piggybacking events using Palantir’s timeline. Selecting these open-ended on the timeline, we pulled these events from the group, and linked them to the associated employees: 30, 38, and 49.

Employee 30 had suspicious pattern of multiple piggybacking events where he did not prox-in to the classified space but did prox-out on the mornings of January 10, 17, and 24. Based on this suspicious behavior, we investigated his officemate's computer for unauthorized Data Transmissions. Viewing all of the officemate’s Data Transmission and prox events in the timeline, we indeed found a Data Transmission that occurred while the employee was in the classified space. Highlighting that event in the Timeline, we can find it on the Graph and see in the Selection Helper that the Data Transmission is to IP address 100.59.151.133.

We then used Palantir’s Search Around application, which searches for specified target objects by property or link type based on an initial object selection, to search for other embassy computers that connected with the suspicious IP address. We found 18 Data Transmissions from 12 different embassy computers.

Investigational Workflow

Working from these 18 Data Transmissions, we devised workflows to answer the following questions:

Was the authorized computer user present during the Data Transmission?
Was the officemate present during the Data Transmission?
What employees had access to a computer during all 18 Data Transmissions?

The workflow involved bringing all 18 Data Transmissions to the graph, along with the associated computers, the authorized users, and the users’ officemates. From there we performed a Search Around to individually bring each employee’s associated events to the graph. We then used the Timeline, which can differentiate objects by color, to visually identify Data Transmissions that were probably unauthorized based on the computer owner being (1) in the classified space, (2) out of the building, or (3) possibly away from the computer based on the temporal pattern of other events.

Finally, we created temporal filters for all 18 Data Transmissions and searched for all overlapping Prox-event Classified events, which we then linked to the associated employees in order to exclude them as possible suspects. Based on this search, we were left with 30 as our prime suspect. While verifying these suspicious transmissions, we were also able to see that there were no other unauthorized transmissions on these individuals’ computers.

Patterns of Suspicious Usage

Of the 18 Data Transmissions to the suspicious IP address, eight occurred while the assigned user was in or just exiting the classified space, six occurred while the user probably was out of the office, and four occurred during a period of no other network activity on the computer, which could plausibly indicate the user was, say, away in an unclassified meeting. Only employee 30 was definitely present during an officemate’s suspicious Data Transmission and so either saw or is the malicious insider. In six other cases suspicious Data Transmissions the respective officemate was in the classified space during the transmission, and in the remainder of cases the officemate probably was outside the building or otherwise away from the desk.

Viewing the 18 transmissions in the Time Wheel helper, we can see that they occur on one to three times per day on Tuesdays and Thursdays over a four week period. We also see that the 18 transmissions used port 8080, were among the largest payload requests during the month, and that the 18 transmissions had the largest payload requests of all port 8080 network traffic. Although 30 had some Data Transmissions in close proximity to the unauthorized transmission, we feel that the evidence points to employee 30 as the malicious insider.